Prediction (Classification) of Fraudulent Account Transactions

Solution Engineering in R

Daniel Borsos, Valerie Högerle, Michaela Hubweber, Florian Ye

2024-05-28

Milestone 3: Modeling

  step     type   amount oldbalanceOrg newbalanceOrig oldbalanceDest
1    1  PAYMENT 3.993023      5.230799       5.204926       0.000000
2    1  PAYMENT 3.270744      4.327359       4.287482       0.000000
3    1 TRANSFER 2.260071      2.260071       0.000000       0.000000
4    1 CASH_OUT 2.260071      2.260071       0.000000       4.325987
5    1  PAYMENT 4.067039      4.618623       4.475480       0.000000
6    1  PAYMENT 3.893135      4.731274       4.663166       0.000000
  newbalanceDest isFraud change.balanceOrg increase.balanceDest flagFraud
1              0      No           9839.64                    0        No
2              0      No           1864.28                    0        No
3              0     Yes            181.00                    0       Yes
4              0     Yes            181.00               -21182       Yes
5              0      No          11668.14                    0        No
6              0      No           7817.71                    0        No
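
The amount and balance columns in the rows above are consistent with a log10(x + 1) transform of the raw values. For example, rows 3 and 4 show 2.260071 next to an untransformed change.balanceOrg of 181.00, and log10(181 + 1) gives that value. The source does not state which transform was applied, so the following is a minimal sketch based on the printed rows only; `log10_plus1` is a hypothetical helper name.

```r
# Hypothetical helper: log10(x + 1) maps a zero balance to 0 instead of -Inf.
log10_plus1 <- function(x) log10(x + 1)

# Raw values inferred from the untransformed columns of the printed rows
# (an assumption, not taken from the authors' code).
raw <- c(0, 181, 9839.64, 21182)
scaled <- log10_plus1(raw)
print(round(scaled, 6))

# Back-transform to the original scale.
print(10^scaled - 1)
```

Adding 1 before taking the logarithm matters here because many balances are exactly zero.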

Train-Test Split

Distribution of 'isFraud' in the training set:

    No    Yes 
0.9987 0.0013 

Distribution of 'isFraud' in the test set:

    No    Yes 
0.9987 0.0013 

Joint distribution of 'type' and 'isFraud' in the training set:
          
                No     Yes
  CASH_IN  0.21992 0.00000
  CASH_OUT 0.35102 0.00065
  DEBIT    0.00651 0.00000
  PAYMENT  0.33815 0.00000
  TRANSFER 0.08311 0.00064

Joint distribution of 'type' and 'isFraud' in the test set:
          
                No     Yes
  CASH_IN  0.21992 0.00000
  CASH_OUT 0.35102 0.00065
  DEBIT    0.00651 0.00000
  PAYMENT  0.33815 0.00000
  TRANSFER 0.08311 0.00064
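
The matching class proportions in the training and test sets suggest a stratified split. The source does not show how the split was produced, so the following is a sketch in base R under that assumption. The helper `stratified_split`, the test fraction of 0.2, the seed, and the synthetic labels are all hypothetical.

```r
# Hypothetical helper: sample the same fraction of rows from every class
# so that the class proportions are preserved in both partitions.
stratified_split <- function(y, test_frac = 0.2, seed = 42) {
  set.seed(seed)
  idx_by_class <- split(seq_along(y), y)
  test_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_test <- round(length(idx) * test_frac)
    idx[sample.int(length(idx), n_test)]
  }), use.names = FALSE)
  list(train = setdiff(seq_along(y), test_idx), test = sort(test_idx))
}

# Synthetic, imbalanced labels for illustration (not the project data).
y <- factor(rep(c("No", "Yes"), times = c(9900, 100)), levels = c("No", "Yes"))
parts <- stratified_split(y)

# Both partitions should show the same class shares.
print(round(prop.table(table(y[parts$train])), 4))
print(round(prop.table(table(y[parts$test])), 4))
```

With a rare positive class, a purely random split could leave the test set with noticeably more or fewer fraud cases than the training set, which is why stratifying on the target is a common choice.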

Baseline Model

[1] "Confusion matrix:"
          Reference
Prediction      No     Yes
       No        0       0
       Yes 1191449    1539
[1] "Recall: 1"
[1] "Precision: 0.00129003812276402"
[1] "F1-Score: 0.00257675213703834"
[1] "AUC Score: 0.5"
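
The baseline labels every transaction as fraud, which explains the perfect recall and the very low precision. The reported values can be reproduced from the four cells of the confusion matrix. The following is a minimal sketch in base R, with "Yes" taken as the positive class; `metrics_from_counts` is a hypothetical helper and not the authors' code.

```r
# Hypothetical helper: standard metrics from confusion-matrix counts.
metrics_from_counts <- function(tp, fp, fn, tn) {
  recall <- tp / (tp + fn)            # share of actual fraud that is caught
  precision <- tp / (tp + fp)         # share of fraud alerts that are correct
  f1 <- 2 * precision * recall / (precision + recall)
  specificity <- tn / (tn + fp)       # share of legitimate rows left alone
  c(recall = recall, precision = precision, f1 = f1, specificity = specificity)
}

# Counts taken from the baseline confusion matrix above.
baseline <- metrics_from_counts(tp = 1539, fp = 1191449, fn = 0, tn = 0)
print(baseline)
```

The precision of the baseline equals the fraud share of the test set, so any useful model has to beat that value.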

Model Training

Decision Tree

[1] "Confusion matrix:"
          Reference
Prediction      No     Yes
       No  1191449       9
       Yes       0    1530
[1] "Recall: 0.994152046783626"
[1] "Precision: 1"
[1] "F1-Score: 0.997067448680352"
[1] "AUC Score: 0.997076023391813"
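
The AUC reported for the decision tree equals the mean of recall and specificity, which is what an AUC reduces to when it is computed from hard class labels rather than from predicted probabilities. The source does not say how the AUC was obtained, so this is an observation from the numbers, sketched here in base R with the counts from the confusion matrix above.

```r
# Counts from the decision tree confusion matrix ("Yes" is the positive class).
tp <- 1530; fn <- 9; fp <- 0; tn <- 1191449

tpr <- tp / (tp + fn)   # recall, also called sensitivity
tnr <- tn / (tn + fp)   # specificity

# With a single operating point, the ROC curve has one corner and the
# area under it is the mean of sensitivity and specificity.
auc_hard_labels <- (tpr + tnr) / 2
print(auc_hard_labels)
```

The same formula gives 0.5 for the baseline. It does not reproduce the AUC values reported for logistic regression and Naive Bayes, which suggests that those were computed from predicted probabilities.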

Logistic Regression

[1] "Confusion matrix:"
          Reference
Prediction      No     Yes
       No  1191449       7
       Yes       0    1532
[1] "Recall: 0.99545159194282"
[1] "Precision: 1"
[1] "F1-Score: 0.997720612178443"
[1] "AUC Score: 0.99951259844789"

Naive Bayes

[1] "Confusion matrix:"
          Reference
Prediction      No     Yes
       No  1188355      34
       Yes    3094    1505
[1] "Recall: 0.977907732293697"
[1] "Precision: 0.32724505327245"
[1] "F1-Score: 0.490387748452265"
[1] "AUC Score: 0.985471432580127"
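
Naive Bayes combines a high AUC with a low precision. This follows from the class imbalance: even a small false positive rate produces more false alarms than there are fraud cases. The following sketch in base R expresses precision through recall, false positive rate, and prevalence, using the counts from the confusion matrix above. The variable names are illustrative.

```r
# Counts from the Naive Bayes confusion matrix ("Yes" is the positive class).
tp <- 1505; fn <- 34; fp <- 3094; tn <- 1188355

prevalence <- (tp + fn) / (tp + fn + fp + tn)  # fraud share in the test set
tpr <- tp / (tp + fn)                          # recall
fpr <- fp / (fp + tn)                          # false positive rate

# Precision written in terms of rates and prevalence (Bayes' rule).
precision <- tpr * prevalence / (tpr * prevalence + fpr * (1 - prevalence))
print(c(prevalence = prevalence, tpr = tpr, fpr = fpr, precision = precision))
```

A false positive rate of roughly 0.26 percent is enough to push precision down to about one third, because legitimate transactions outnumber fraudulent ones by several hundred to one.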